1.
BMC Bioinformatics ; 17: 158, 2016 Apr 08.
Article in English | MEDLINE | ID: mdl-27059502

ABSTRACT

BACKGROUND: Existing feature selection methods typically do not consider prior knowledge in the form of structural relationships among features. In this study, the features are structured into groups based on prior knowledge. The problem addressed in this article is how to select one representative feature from each group such that the selected features jointly discriminate the classes. The problem is formulated as a binary constrained optimization problem, and the combinatorial optimization is relaxed to a convex-concave problem, which is then transformed into a sequence of convex optimization problems so that it can be solved by any standard optimization algorithm. Moreover, a block coordinate gradient descent optimization algorithm is proposed for high-dimensional feature selection, which in our experiments was four times faster than a standard optimization algorithm.
RESULTS: To test the effectiveness of the proposed formulation, we used microarray analysis as a case study, where genes with similar expression or similar molecular functions were grouped together. In particular, the proposed block coordinate gradient descent feature selection method is evaluated on five benchmark microarray gene expression datasets, and evidence is provided that the proposed method gives more accurate results than state-of-the-art gene selection methods. Out of 25 experiments, the proposed method achieved the highest average AUC in 13 experiments, while the other methods achieved higher average AUC in no more than 6 experiments.
CONCLUSION: A method is developed to select a feature from each group. When the features are grouped based on similarity in gene expression, we showed that the proposed algorithm is more accurate than state-of-the-art gene selection methods that were specifically developed to select highly discriminative and less redundant genes. In addition, the proposed method can exploit any grouping structure among features, while alternative methods are restricted to similarity-based grouping.
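To make the "one feature per group" constraint concrete, the following is a minimal sketch and not the authors' convex-concave relaxation: it swaps in a discrete block coordinate search, where each group is one block and a simple Fisher-style ratio stands in for the joint objective. The function names, the scoring criterion, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def fisher_score(X, y, selected):
    """Joint two-class separation using only the selected feature columns.

    Assumed criterion (not from the abstract): summed squared difference of
    class means divided by summed within-class variances.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    between = 0.0
    within = 0.0
    for j in selected:
        p = [r[j] for r in pos]
        n = [r[j] for r in neg]
        between += (mean(p) - mean(n)) ** 2
        within += pvariance(p) + pvariance(n)
    return between / (within + 1e-12)


def select_one_per_group(X, y, groups, sweeps=5):
    """Pick exactly one feature index from each group.

    Each sweep visits the groups in turn and, holding the other groups'
    choices fixed, switches to a feature only if it strictly improves the
    joint score. This is a discrete stand-in for block coordinate descent.
    """
    choice = [group[0] for group in groups]
    current = fisher_score(X, y, choice)
    for _ in range(sweeps):
        changed = False
        for k, group in enumerate(groups):
            for j in group:
                trial = choice[:k] + [j] + choice[k + 1:]
                score = fisher_score(X, y, trial)
                if score > current:
                    choice, current, changed = trial, score, True
        if not changed:
            break
    return choice


# Hypothetical toy data: 4 samples, 4 features, two groups of two features.
X = [
    [0.1, 0.0, 1.0, 0.5],
    [0.9, 0.1, 1.1, 0.4],
    [0.2, 1.0, 0.0, 0.6],
    [0.8, 0.9, 0.1, 0.5],
]
y = [0, 0, 1, 1]
groups = [[0, 1], [2, 3]]
print(select_one_per_group(X, y, groups))
```

Because the score is a ratio over all selected features, the best feature in one group depends on what is chosen in the others, which is the sense in which the selection is joint rather than per-group.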


Subjects
Algorithms , Models, Theoretical , Corneal Neovascularization/diagnosis , Corneal Neovascularization/genetics , Databases, Genetic , Gene Expression Regulation , Gene Ontology , Genetic Variation , HIV Infections/diagnosis , HIV Infections/genetics , Hemoglobinuria/diagnosis , Hemoglobinuria/genetics , Humans , Melanoma/diagnosis , Melanoma/genetics , Microarray Analysis , Multiple Myeloma/diagnosis , Multiple Myeloma/genetics , Neuroendocrine Tumors/diagnosis , Neuroendocrine Tumors/genetics , Nevus/diagnosis , Nevus/genetics , Stress, Physiological/genetics , Virus Diseases/diagnosis , Virus Diseases/genetics
2.
Int J Data Min Bioinform ; 11(4): 392-411, 2015.
Article in English | MEDLINE | ID: mdl-26336666

ABSTRACT

Early classification of time series has received considerable attention recently. In this paper we present a model, which we call the Early Classification Model (ECM), that allows for early, accurate and patient-specific classification of multivariate observations. ECM integrates the widely used Hidden Markov Model (HMM) and Support Vector Machine (SVM) models. It attained very promising results on the datasets we tested it on: in one set of experiments based on a published dataset of response to drug therapy in Multiple Sclerosis patients, ECM used only an average of 40% of a time series and was able to outperform some of the baseline models, which needed the full time series for classification. In experiments on a sepsis therapy dataset, ECM surpassed both the standard threshold-based method and the state-of-the-art method for early classification of multivariate time series.


Subjects
Computational Biology/methods , Databases, Factual , Diagnosis, Computer-Assisted/methods , Support Vector Machine , Gene Expression Profiling , Humans , Markov Chains , Multiple Sclerosis/drug therapy , Multivariate Analysis , Sepsis/classification , Sepsis/diagnosis
3.
BMC Bioinformatics ; 13: 195, 2012 Aug 08.
Article in English | MEDLINE | ID: mdl-22873729

ABSTRACT

BACKGROUND: Early classification of time series is beneficial for biomedical informatics problems including, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts to gain insights into the classification results. This problem has been studied recently using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts time series patterns, called multivariate shapelets, from all dimensions of the time series that distinctly manifest the target class locally. The time series were classified by searching for the earliest closest patterns.
RESULTS: The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%-64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed methods specialized for early classification.
CONCLUSION: For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%-64% of the time series' length.
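The idea of classifying a series as soon as an early pattern is matched can be sketched as follows. This is an illustrative simplification, not the MSD algorithm: it assumes the shapelets, their distance thresholds, and their class labels have already been extracted, and it uses a plain Euclidean distance over all dimensions. The names `window_distance` and `classify_early` and the toy values are hypothetical.

```python
import math


def window_distance(series, pattern, start):
    """Euclidean distance between a multivariate pattern and the window of
    `series` beginning at `start`. Both are lists of per-dimension lists."""
    total = 0.0
    for dim_series, dim_pattern in zip(series, pattern):
        for offset, value in enumerate(dim_pattern):
            total += (dim_series[start + offset] - value) ** 2
    return math.sqrt(total)


def classify_early(series, shapelets):
    """Reveal the series one time point at a time and return
    (label, fraction_of_series_used) for the first shapelet that matches.

    `shapelets` is a list of (pattern, threshold, label) tuples; these are
    assumed to come from a separate training step not shown here.
    """
    n = len(series[0])
    for t in range(1, n + 1):  # t = number of time points observed so far
        for pattern, threshold, label in shapelets:
            m = len(pattern[0])
            if t < m:
                continue
            # Only the newest window is checked; earlier windows were
            # already checked when fewer points had been observed.
            if window_distance(series, pattern, t - m) <= threshold:
                return label, t / n
    return None, 1.0


# Hypothetical two-dimensional series of length 10 and one shapelet.
series = [
    [0, 0, 1, 2, 3, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
]
shapelets = [([[1, 2, 3], [0, 0, 1]], 0.5, "responder")]
print(classify_early(series, shapelets))
```

In this toy case the pattern is matched after five of ten time points, so the decision is returned using half of the series.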


Subjects
Medical Informatics/methods , Algorithms , Classification/methods , Gene Expression , Humans , Influenza, Human/genetics , Influenza, Human/metabolism , Multiple Sclerosis/drug therapy , Multiple Sclerosis/genetics , Multiple Sclerosis/metabolism , Multivariate Analysis
4.
BMC Med Genomics ; 5: 10, 2012 Apr 12.
Article in English | MEDLINE | ID: mdl-22498030

ABSTRACT

BACKGROUND: Infant birth weight is a complex quantitative trait associated with both neonatal and long-term health outcomes. Numerous studies have been published in which candidate genes (IGF1, IGF2, IGF2R, IGF binding proteins, PHLDA2 and PLAGL1) have been associated with birth weight, but these studies are difficult to reproduce in humans, and large cohort studies are needed due to the large inter-individual variance in transcription levels. Also, very little of the trait variance is explained. We decided to identify additional candidates without regard for what is known about the genes. We hypothesize that DNA methylation differences between individuals can serve as markers of gene "expression potential" at growth-related genes throughout development and that these differences may correlate with birth weight better than single time point measures of gene expression.
METHODS: We performed DNA methylation and transcript profiling on cord blood and placenta from newborns. We then used novel computational approaches to identify genes correlated with birth weight.
RESULTS: We identified 23 genes whose methylation levels explain 70-87% of the variance in birth weight. Six of these (ANGPT4, APOE, CDK2, GRB10, OSBPL5 and REG1B) are associated with growth phenotypes in human or mouse models. Gene expression profiling explained a much smaller fraction of variance in birth weight than did DNA methylation. We further show that two genes, the transcriptional repressor MSX1 and the growth factor receptor adaptor protein GRB10, are correlated with transcriptional control of at least seven genes reported to be involved in fetal or placental growth, suggesting that we have identified important networks in growth control. GRB10 methylation is also correlated with genes involved in reactive oxygen species signaling, stress signaling and oxygen sensing, and more recent data implicate GRB10 in insulin signaling.
CONCLUSIONS: Single time point measurements of gene expression may reflect many factors unrelated to birth weight, while inter-individual differences in DNA methylation may represent a "molecular fossil record" of differences in birth weight-related gene expression. Finding these "unexpected" pathways may tell us something about the long-term association between low birth weight and adult disease, as well as which genes may be susceptible to environmental effects. These findings increase our understanding of the molecular mechanisms involved in human development and disease progression.


Subjects
Birth Weight/genetics , DNA Methylation , Adult , Animals , Computational Biology , Female , Fetal Blood/metabolism , GRB10 Adaptor Protein/genetics , GRB10 Adaptor Protein/metabolism , Gene Expression Profiling , Humans , Infant, Newborn , MSX1 Transcription Factor/genetics , MSX1 Transcription Factor/metabolism , Mice , Phenotype , Placenta/metabolism , Pregnancy , Transcription, Genetic
5.
Mol Biosyst ; 8(1): 381-91, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22101336

ABSTRACT

UNLABELLED: A grand challenge in the proteomics and structural genomics era is the prediction of protein structure, including identification of those proteins that are partially or wholly unstructured. A number of predictors for identification of intrinsically disordered proteins (IDPs) have been developed over the last decade, but none can be taken as fully reliable on its own. Using a single model for prediction is typically inadequate because prediction based on only the most accurate model ignores model uncertainty. In this paper, we present an empirical method to specify and measure uncertainty associated with disorder predictions. In particular, we analyze the uncertainty in the reference model itself and the uncertainty in data. This is achieved by training a set of models and developing several meta-predictors on top of them. The best meta-predictor achieved comparable or better results than any other single model, suggesting that incorporating different aspects of protein disorder prediction is important for the disorder prediction task. In addition, the best meta-predictor had more balanced sensitivity and specificity than any individual model. We also assessed the effects of changes in disorder prediction as a function of changes in the protein sequence. For collections of homologous sequences, we found that mutations caused many of the predicted disordered residues to be flipped to be predicted as ordered residues, while the reverse was observed much less frequently. These results suggest that disorder tendencies are more sensitive to allowed mutations than structure tendencies and that the conservation of disorder is indeed less stable than conservation of structure.
AVAILABILITY: Five meta-predictors and four single models developed for this study will be publicly freely accessible for non-commercial use.
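As a rough illustration of building a meta-predictor on top of several single models, the sketch below averages per-residue disorder probabilities, reports the spread between models as a crude uncertainty signal, and computes sensitivity and specificity. The abstract does not say how its meta-predictors combine the models, so simple averaging is an assumption here, as are the function names, the 0.5 threshold, and the toy scores.

```python
from statistics import mean, pstdev


def meta_predict(model_scores, threshold=0.5):
    """Combine per-residue disorder probabilities from several models.

    model_scores: one list of probabilities per model, all the same length.
    Returns (labels, consensus, spread), where spread is the between-model
    standard deviation at each residue.
    """
    columns = list(zip(*model_scores))
    consensus = [mean(col) for col in columns]
    spread = [pstdev(col) for col in columns]
    labels = [1 if p >= threshold else 0 for p in consensus]
    return labels, consensus, spread


def sensitivity_specificity(predicted, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Returns NaN for a rate whose denominator is zero."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical scores from three models for a four-residue sequence.
model_scores = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.4, 0.3, 0.2],
    [0.7, 0.3, 0.9, 0.0],
]
truth = [1, 0, 0, 0]  # 1 = disordered, 0 = ordered (made-up annotation)

labels, consensus, spread = meta_predict(model_scores)
print(labels, sensitivity_specificity(labels, truth))
```

Residues where the spread is large are the ones on which the single models disagree most, which is one simple way to surface model uncertainty alongside the consensus call.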


Subjects
Computational Biology/methods , Protein Folding , Proteins/chemistry , Proteins/metabolism , Uncertainty , Amino Acid Sequence , Amino Acids/metabolism , Databases, Protein , Models, Molecular , Molecular Sequence Data , Protein Conformation , ROC Curve , Software